LIKELIHOOD APPROACH FOR MARGINAL PROPORTIONAL 
HAZARDS REGRESSION IN THE PRESENCE OF DEPENDENT 
CENSORING 

By Donglin Zeng 

University of North Carolina at Chapel Hill 

In many public health problems, an important goal is to identify 
the effect of some treatment or intervention on the risk of failure for the 
whole population. A marginal proportional hazards regression model 
is often used to analyze such an effect. When dependent censoring 
is explained by many auxiliary covariates, we utilize two working 
models to condense high-dimensional covariates to achieve dimension 
reduction. Then the estimator of the treatment effect is obtained by 
maximizing a pseudo-likelihood function over a sieve space. Such an 
estimator is shown to be consistent and asymptotically normal when 
either of the two working models is correct; additionally, when both 
working models are correct, its asymptotic variance is the same as 
the semiparametric efficiency bound. 

1. Introduction. In many public health problems, an important goal is 
to study the effect of some treatment or intervention on the risk of failure. 
A commonly used model for analyzing such an effect is the proportional 
hazards regression model 

(1.1) $h_{T|V}(t|v) = \lambda(t) e^{\alpha v}$, 

where V denotes the measurement of treatment, T denotes failure time and 
$h_{T|V}(t|v)$ denotes the hazard rate function of T given V. In model (1.1), 
$\lambda(t)$ is an unknown baseline hazard rate function and $\alpha$ is an unknown 
parameter describing the effect of V. A marginal regression model such as 
(1.1) is often useful in public health problems, since in that field the scientific 
goal is to identify the effect of treatment for the whole population regardless 
of heterogeneity within the population; in other words, we would not adjust 
for other covariates in the regression model (1.1) even if such covariates 
are measured at the time of data collection. Other reasons why 
additional covariates would not be adjusted for in the regression model in 
epidemiologic studies are given in Robins, Rotnitzky and Zhao (1994). 

Dependent right-censoring is common in failure time data, where subjects 
may drop out or be censored during the study. Censoring can be 
caused by many factors, such as patients' feelings about participating 
in the study, the social support available to patients, patients' access to the 
study, biological information on patients, and so on. In practice, when a 
large amount of such information is collected, it is reasonable to assume that the 
dependence between the failure time and the censoring time is fully explained 
by all the collected covariates. In mathematical notation, if we denote C as 
censoring time and denote X as other auxiliary covariates besides V, then 
we assume that, conditional on X and V, T and C are independent. 

Suppose n i.i.d. right-censored observations are available and we denote 
them as $(Y_i = T_i \wedge C_i, R_i = I(T_i \le C_i), X_i, V_i)$, $i = 1,\dots,n$. Our goal is to 
estimate the treatment effect $\alpha$ in model (1.1). It is well known that, 
in the presence of dependent censoring, simply performing the Cox regression 
using V as the covariate gives an inconsistent estimate. In order to adjust 
for dependent censoring, one intuitive approach is to estimate the distribution 
of T given (X, V) either nonparametrically or semiparametrically. 
However, two weak points restrict the use of this approach: one is that 
nonparametric estimation is not feasible with moderate samples when X has 
more than three dimensions, which is known as the curse of dimensionality; 
the other is that many semiparametric models of T given (X, V) are 
generally not compatible with (1.1), while the latter, as indicated above, is of 
main scientific interest. Recently, an estimating equation approach was proposed 
by Robins, Rotnitzky and Zhao (1994) and was successfully applied 
to missing longitudinal data; however, to our knowledge, such an approach 
has not been applied to regression problems for survival data, except for 
a brief discussion in Robins, Rotnitzky and Zhao (1994). Furthermore, the 
implementation of the estimating equation approach relies on the derivation 
of the efficient score function for $\alpha$, which is implicit and difficult for 
model (1.1). 

In this paper, we propose a likelihood-based approach to estimate the 
parameters in the marginal proportional hazards model (1.1). The ideas for 
handling dependent censoring are similar to those in one of our previous 
papers [Zeng (2004)]. Briefly, we first condense the high-dimensional covariates 
(X, V) by utilizing two working models for the distribution of T given 
(X, V) and the distribution of C given (X, V). Then an estimate for the 
coefficient $\alpha$ in (1.1) is obtained by maximizing a pseudo-likelihood function of 
the reduced data, which consist of the observed event times, the censoring 
status, the treatment and the condensed covariates. In the maximization, 
the nuisance parameters for $\alpha$ are profiled out over a sieve space consisting 
of B-splines. At the end of this paper we demonstrate that the estimator for 
$\alpha$ has the following properties: if either of the two working models is correct, 
the estimator is consistent and asymptotically normal; if both working 
models are correct, the estimator's asymptotic variance attains the 
semiparametric efficiency bound. The first property is called double robustness 
by Robins, Rotnitzky and van der Laan (2000). The details of the proofs are 
given in the Appendix. 

2. Estimation. For convenience, we denote $f_{Z_1|Z_2}(\cdot|\cdot)$ as the conditional 
density of a random vector $Z_1$ given another random vector $Z_2$ and abbreviate 
$(X^T, V)^T$ as W. 

2.1. Estimation procedure. First we utilize two working models for the 
distribution of T and C given W. 

Working Model 1. We tentatively assume that T is independent of 
X given V, so $f_{T|W}(y|w) = f_{T|V}(y|v)$. 

Working Model 2. We tentatively assume that the model of C given 
W is a proportional hazards model, that is, $h_{C|W}(y|w) = h_c(y) e^{\gamma^T w}$ for an 
unknown vector $\gamma$ and an unknown baseline hazard rate function $h_c(\cdot)$. 

Remark. In fact, any model can be used for Working Model 1 and there 
are two reasons for us to choose the current form: first, this is a simple one 
to work with; second, our later results show that, to ensure our proposed 
estimator is more likely to be consistent, such a working model has to satisfy 
the constrained form in (1.1). Obviously, the current Working Model 1 is the 
most convenient choice. 

To illustrate the estimation procedure, we suppose that either working 
model is correct and that $\gamma$ is a known constant. We let $[a(\gamma, v), b(\gamma, v)]$ be 
the support of the conditional distribution of $\gamma^T W$ given V = v and define 

$$U(\gamma) = [\gamma^T W - a(\gamma, V)]/[b(\gamma, V) - a(\gamma, V)]$$ 

for fixed $\gamma$. Then the conditional distribution of $U(\gamma)$ given V has support 
[0,1]. As shown in Lemma 3.1 of Zeng (2004), T and C are independent 
given $(U(\gamma), V)$ when either working model is correct; in other words, the 
dependence between T and C can be fully explained by the two-dimensional 
condensed information $(U(\gamma), V)$. We replace the observed statistics W with 
$(U(\gamma), V)$ and obtain reduced data $(Y_i, R_i, U_i, V_i)$, where $U_i = \gamma^T W_i$, 
$i = 1,\dots,n$. Therefore, the observed likelihood function of the reduced data 
concerning the joint distribution of $(T, U(\gamma))$ given V is 

$$\prod_{i=1}^n \left[f_{T|U(\gamma),V}(Y_i|U_i,V_i)\right]^{R_i} \left[\int_{Y_i}^{\infty} f_{T|U(\gamma),V}(s|U_i,V_i)\,ds\right]^{1-R_i} f_{U(\gamma)|V}(U_i|V_i).$$ 
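The condensing step can be illustrated with a minimal sketch. It assumes the support bounds a(γ, v) and b(γ, v) are available as functions (Assumption A4 later takes them to be known); the function name `condensed_covariate` and the particular bounds, which presume X ranges over the unit square, are hypothetical choices made only for this illustration.

```python
from typing import Callable, Sequence


def condensed_covariate(
    gamma: Sequence[float],
    w: Sequence[float],
    v: float,
    a: Callable[[Sequence[float], float], float],
    b: Callable[[Sequence[float], float], float],
) -> float:
    """Return U(gamma) = (gamma'W - a(gamma, v)) / (b(gamma, v) - a(gamma, v))."""
    lo, hi = a(gamma, v), b(gamma, v)
    if hi <= lo:
        raise ValueError("the support [a, b] must have positive length")
    score = sum(g * x for g, x in zip(gamma, w))
    return (score - lo) / (hi - lo)


# Hypothetical bounds: W = (X1, X2, V) with X assumed to range over [0, 1]^2,
# so gamma'W is smallest (largest) when each X_j sits at the end of [0, 1]
# that its coefficient sign favours.
def a_fn(gamma: Sequence[float], v: float) -> float:
    return sum(min(g, 0.0) for g in gamma[:-1]) + gamma[-1] * v


def b_fn(gamma: Sequence[float], v: float) -> float:
    return sum(max(g, 0.0) for g in gamma[:-1]) + gamma[-1] * v


u = condensed_covariate([2.0, -4.0, -0.1], [0.3, 0.5, 1.0], 1.0, a_fn, b_fn)
```

Whatever the dimension of W, each subject is summarized by one number in [0, 1] together with V, which is the dimension reduction the procedure relies on.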



In order to absorb the marginal model (1.1) into the observed likelihood 
function, a natural reparameterization is to use the conditional density of 
$U(\gamma)$ given T and V and the conditional density of T given V as the new 
parameters. The latter contains the parameters $\lambda(y)$ and $\alpha$. However, since 
T is only observable in $[0, \tau)$, where $\tau$ is the end time of the study, the 
conditional density of $U(\gamma)$ given T and V is not identifiable for $T > \tau$. 
Therefore, we introduce a modified variable $\tilde T = T I(T < \tau) + \tau I(T \ge \tau)$; 
that is, $\tilde T$ is the same as T if T is observed within the study time frame and 
$\tilde T$ is equal to $\tau$ if T is out of the observable range. Then it is easy to calculate 
the density function of $\tilde T$ given V = v as 

$$I(t < \tau)\,\lambda(t) e^{\alpha v} \exp\{-e^{\alpha v}\Lambda(t)\} + \delta(t = \tau) \exp\{-e^{\alpha v}\Lambda(\tau)\},$$ 

where $\delta(\cdot)$ is the Dirac function and $\Lambda(t) = \int_0^t \lambda(s)\,ds$ is the cumulative 
baseline hazard. Moreover, we denote $f_{U(\gamma)}(\cdot|y, v)$ as the 
conditional density of $U(\gamma)$ given $\tilde T = y$ and V = v for $y \in (0, \tau)$ and 
denote $g_{U(\gamma)}(\cdot|\tau, v)$ as the conditional density of $U(\gamma)$ given $\tilde T = \tau$ and V = v. 
Thus, $f_{U(\gamma)}(\cdot|y, v)$ is the same as the conditional density of $U(\gamma)$ given 
T = y and V = v, and $g_{U(\gamma)}(\cdot|\tau, v)$ is the same as the conditional density 
of $U(\gamma)$ given $T \ge \tau$ and V = v. Since the observed data are equivalent 
to $(U_i, V_i, R_i = I(\tilde T_i \le C_i), Y_i = \tilde T_i \wedge C_i)$, in terms of the new parameters 
$(\alpha, \lambda(y), f_{U(\gamma)}(u|y,v), g_{U(\gamma)}(u|\tau,v))$ the observed likelihood function can be 
written as 

(2.1) $$\prod_{i=1}^n \left[\exp\{-e^{\alpha V_i}\Lambda(Y_i)\}\, e^{\alpha V_i}\lambda(Y_i)\, f_{U(\gamma)}(U_i|Y_i,V_i)\right]^{R_i} \times \left[\int_{Y_i}^{\tau} \exp\{-e^{\alpha V_i}\Lambda(s)\}\, e^{\alpha V_i}\lambda(s)\, f_{U(\gamma)}(U_i|s,V_i)\,ds + \exp\{-e^{\alpha V_i}\Lambda(\tau)\}\, g_{U(\gamma)}(U_i|\tau,V_i)\right]^{1-R_i}.$$ 



Clearly, all the parameters are distinct and identifiable. 
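To make the structure of the likelihood in (2.1) concrete, the following sketch evaluates the log-contribution of a single observation. It is only an illustration under stated assumptions: the names `integrate` and `log_contribution` are hypothetical, the integral is approximated by a composite Simpson rule, and the baseline hazard 3t with cumulative hazard 1.5t² is an arbitrary choice. When both conditional densities of U are identically 1, the censored contribution should reduce to the log survival function, which gives a simple check.

```python
import math


def integrate(fn, lo, hi, steps=2000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    total = fn(lo) + fn(hi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * fn(lo + j * h)
    return total * h / 3.0


def log_contribution(y, r, u, v, alpha, lam, cum_lam, f_u, g_u, tau):
    """Log of one factor of the likelihood: failure if r == 1, censored otherwise."""
    risk = math.exp(alpha * v)

    def joint(s):
        # density of the failure time at s times the density of U given (s, v)
        return math.exp(-risk * cum_lam(s)) * risk * lam(s) * f_u(u, s, v)

    if r == 1:
        return math.log(joint(y))
    # censored: failure somewhere in (y, tau), or survival beyond tau
    tail = math.exp(-risk * cum_lam(tau)) * g_u(u, v)
    return math.log(integrate(joint, y, tau) + tail)


# Check: with f_u = g_u = 1 the censored term equals -exp(alpha * v) * Lambda(y).
val = log_contribution(
    0.4, 0, 0.5, 1.0, 1.0,
    lambda t: 3.0 * t, lambda t: 1.5 * t * t,
    lambda u, s, v: 1.0, lambda u, v: 1.0,
    1.0,
)
```

The check reflects the fact that U carries information about T only through the conditional densities f and g; without them the expression collapses to the usual censored-data likelihood for model (1.1).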

Finally, we maximize the function (2.1) over a sieve space of the parameters 
$(\alpha, \lambda(y), f_{U(\gamma)}(u|y,v), g_{U(\gamma)}(u|\tau,v))$ for some estimate of $\gamma$. In the 
following sections, we describe how to obtain an estimate of $\gamma$ and how to 
construct a sieve space for the parameters. 
2.2. An estimate for $\gamma$. We estimate $\gamma$ by performing the proportional 
hazards regression using the censored observations. That is, we maximize 
the following pseudo-partial likelihood function for $\gamma$: 

$$\prod_{i=1}^n \left\{\frac{e^{\gamma^T W_i}}{\sum_{Y_j \ge Y_i} e^{\gamma^T W_j}}\right\}^{1-R_i}.$$ 

The estimator for $\gamma$ is denoted as $\hat\gamma_n$. As shown in Theorem 3.1 of Zeng 
(2004), under some regularity conditions $\hat\gamma_n$ converges to a constant $\gamma^*$ 
almost surely and $\sqrt{n}(\hat\gamma_n - \gamma^*)$ has an asymptotically linear expansion with 
its influence function denoted by $S(Y, R, W; \gamma^*)$. 
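As a minimal sketch of this step, the function below evaluates the logarithm of a pseudo-partial likelihood in which the censored observations play the role of events. The function name is hypothetical, and the quadratic-time risk-set sum is chosen for readability rather than speed; in practice the maximization over γ would be handed to any standard Cox regression routine with the censoring indicator 1 - R as the event indicator.

```python
import math


def censoring_log_partial_likelihood(gamma, y, r, w):
    """Sum over censored i of gamma'W_i - log(sum_{j: Y_j >= Y_i} exp(gamma'W_j))."""
    scores = [sum(g * x for g, x in zip(gamma, wi)) for wi in w]
    total = 0.0
    for i, (yi, ri) in enumerate(zip(y, r)):
        if ri == 1:
            # observed failures contribute nothing: censoring is the "event" here
            continue
        risk_set = sum(math.exp(scores[j]) for j, yj in enumerate(y) if yj >= yi)
        total += scores[i] - math.log(risk_set)
    return total


# With gamma = 0 each censored subject contributes -log(size of its risk set).
ll = censoring_log_partial_likelihood(
    [0.0, 0.0],
    [1.0, 2.0, 3.0],
    [0, 1, 0],
    [[0.1, 1.0], [0.4, 0.0], [0.2, 1.0]],
)
```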

2.3. Sieve space $S_n$ for the parameters $(\alpha, \lambda(y), f_{U(\gamma)}(u|y,v), g_{U(\gamma)}(u|\tau,v))$. 
We propose a sieve space consisting of B-splines for $f_{U(\gamma)}(u|y,v)$, $g_{U(\gamma)}(u|\tau,v)$ 
and $\lambda(y)$ in maximizing (2.1). We suppose that $0 \le V \le 1$ and that $|\alpha| \le M$ 
for a known constant M. 

We reparameterize $(f_{U(\gamma)}(u|y,v), g_{U(\gamma)}(u|\tau,v), \lambda(y))$ by introducing 

$$f_{U(\gamma)}(u|y,v) = \frac{\exp\{\eta_1(u,y,v)\}}{\int_0^1 \exp\{\eta_1(u,y,v)\}\,du}, \qquad g_{U(\gamma)}(u|\tau,v) = \frac{\exp\{\eta_2(u,v)\}}{\int_0^1 \exp\{\eta_2(u,v)\}\,du},$$ 

and $\lambda(y) = \exp\{\xi(y)\}$, where $\eta_1(u,y,v)$ and $\eta_2(u,v)$ satisfy $\eta_1(0,y,v) = 0$, 
$\eta_2(0,v) = 0$. After the reparameterization, the new parameters are $(\alpha, \xi(y), 
\eta_1(u,y,v), \eta_2(u,v))$ in which $0 \le u, v \le 1$, $0 \le y \le \tau$. A sieve space consisting 
of B-splines is defined for these new parameters as follows: First, we obtain 
an extended partition with equal length $1/K_n$ for the interval [0,1]: 

$$\Delta_e = \{s_{-m} = \cdots = s_{-1} = 0 = s_0 < s_1 < \cdots < s_{K_n} = 1 = \cdots = s_{K_n+m}\},$$ 

where m (independent of n) and $K_n$ are two integers to be chosen later. Let 
$\{N_i^m(s)\}_{i=1-m}^{K_n}$ be a normalized B-spline basis associated with $\Delta_e$ [cf. 
Schumaker (1981)]. Then the sieve space for the parameters $(\alpha, \xi(y), \eta_1(u,y,v), \eta_2(u,v))$ 
is defined as 

(2.2) $$\begin{aligned} S_n(m,K_n,M_n) = \Bigg\{ & (\alpha, \xi(y), \eta_1(u,y,v), \eta_2(u,v)) : |\alpha| \le M, \\ & \eta_1(u,y,v) = \sum_{i_1,i_2,i_3=1}^{m+K_n} \eta^1_{i_1,i_2,i_3} N^m_{i_1}(u) N^m_{i_2}(y/\tau) N^m_{i_3}(v), \\ & \eta_2(u,v) = \sum_{i_1,i_2=1}^{m+K_n} \eta^2_{i_1,i_2} N^m_{i_1}(u) N^m_{i_2}(v), \quad \xi(y) = \sum_{i=1}^{m+K_n} \xi_i N^m_i(y/\tau), \\ & \sum_{i_1,i_2,i_3=1}^{m+K_n} |\eta^1_{i_1,i_2,i_3}| \le M_n, \quad \sum_{i_1,i_2=1}^{m+K_n} |\eta^2_{i_1,i_2}| \le M_n, \quad \sum_{i=1}^{m+K_n} |\xi_i| \le M_n, \\ & \sum_{i_1=1}^{m+K_n} \eta^1_{i_1,i_2,i_3} N^m_{i_1}(0) = 0, \quad \sum_{i_1=1}^{m+K_n} \eta^2_{i_1,i_2} N^m_{i_1}(0) = 0 \Bigg\}. \end{aligned}$$ 

In other words, we use a finite linear combination of the B-splines to 
approximate each nonparametric function. The last two constraints in 
the definition of the sieve space ensure that $\eta_1(0,y,v) = 0$ and $\eta_2(0,v) = 0$. 
The constants $M_n$ and $K_n$ depend on n and will be chosen later. 
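For readers who wish to experiment with such a basis, the sketch below evaluates B-splines on an equally spaced partition of [0, 1] with repeated boundary knots by the Cox-de Boor recursion. It is an illustration, not the construction used for the reported results: the function names are hypothetical, and the basis is taken to consist of the m + K splines of degree m on a knot sequence with m + 1 copies of each endpoint, which is one reading of the extended partition above.

```python
def extended_knots(m, k):
    """Knots with m + 1 copies of 0 and of 1 around the interior points j / k."""
    return [0.0] * m + [j / k for j in range(k + 1)] + [1.0] * m


def bspline_basis(s, m, k):
    """Values at s in [0, 1] of the m + k degree-m B-splines (Cox-de Boor)."""
    t = extended_knots(m, k)
    n_int = len(t) - 1
    # degree 0: indicator of the knot interval containing s; the last
    # non-degenerate interval is closed on the right so that s = 1 is covered
    vals = [0.0] * n_int
    for j in range(n_int):
        if t[j] <= s < t[j + 1] or (s == 1.0 and t[j] < t[j + 1] == 1.0):
            vals[j] = 1.0
    for d in range(1, m + 1):
        nxt = []
        for j in range(n_int - d):
            left = right = 0.0
            if t[j + d] > t[j]:
                left = (s - t[j]) / (t[j + d] - t[j]) * vals[j]
            if t[j + d + 1] > t[j + 1]:
                right = (t[j + d + 1] - s) / (t[j + d + 1] - t[j + 1]) * vals[j + 1]
            nxt.append(left + right)
        vals = nxt
    return vals


values_at_zero = bspline_basis(0.0, 3, 5)
```

With this knot layout only the first basis function is nonzero at 0, so under this reading a constraint of the form "the spline vanishes at u = 0" amounts to fixing the coefficients with first index equal to 1 at zero. The basis values also sum to one at every point of [0, 1].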

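2.4. Maximization. Let $P_n$, P denote the empirical measure and the 
true probability measure of (Y, R, W), respectively, and let $\hat U = [\hat\gamma_n^T W - 
a(\hat\gamma_n, V)]/[b(\hat\gamma_n, V) - a(\hat\gamma_n, V)]$. We maximize the function 

(2.3) $$\begin{aligned} & P_n\left[R \log\left(\exp\left\{-\int_0^Y e^{\xi(s)+\alpha V}\,ds\right\} e^{\xi(Y)+\alpha V}\, \frac{\exp\{\eta_1(\hat U,Y,V)\}}{\int_0^1 \exp\{\eta_1(u,Y,V)\}\,du}\right)\right] \\ & + P_n\left[(1-R) \log\left(\int_Y^\tau \exp\left\{-\int_0^s e^{\xi(s')+\alpha V}\,ds'\right\} e^{\xi(s)+\alpha V}\, \frac{\exp\{\eta_1(\hat U,s,V)\}}{\int_0^1 \exp\{\eta_1(u,s,V)\}\,du}\,ds + \exp\left\{-\int_0^\tau e^{\xi(s)+\alpha V}\,ds\right\} \frac{\exp\{\eta_2(\hat U,V)\}}{\int_0^1 \exp\{\eta_2(u,V)\}\,du}\right)\right] \end{aligned}$$ 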

over the sieve space $S_n(m, K_n, M_n)$. One possible choice of $(m, K_n, M_n)$ is 
$(k + 2, Mn^{\beta}, M\sqrt{\log n})$ for some given constant M, a known integer $k > 11$ 
and a constant $\beta$ satisfying $\frac{1}{2k} < \beta < \frac{3}{4k+9}$. 

It will be shown later that $|a(\hat\gamma_n, v) - b(\hat\gamma_n, v)|$ has a positive limit for 
any v with probability 1. Then the arguments of the maximum exist since 
we are maximizing the function over a compact set in a finite-dimensional 
space. However, the solution itself may not be unique. We simply select any 
one of these maximizers and denote it as $(\hat\alpha_n, \hat\xi_n(y), \hat\eta_{1n}(u,y,v), \hat\eta_{2n}(u,v))$. 
Correspondingly, we obtain the estimators $\hat\alpha_n$, $\hat\lambda_n(y) = \exp\{\hat\xi_n(y)\}$ and 

$$\hat f_n(u|y,v) = \frac{\exp\{\hat\eta_{1n}(u,y,v)\}}{\int_0^1 \exp\{\hat\eta_{1n}(u,y,v)\}\,du}, \qquad \hat g_n(u|\tau,v) = \frac{\exp\{\hat\eta_{2n}(u,v)\}}{\int_0^1 \exp\{\hat\eta_{2n}(u,v)\}\,du}.$$ 

Computationally, many constrained optimization algorithms such as the 
quasi-Newton method, combined with the use of either a penalty or a barrier 
function, can be applied to find the arguments of the maximization. 
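The density estimators in this section are normalized exponentials of spline functions, so positivity and unit mass hold by construction and the optimizer can work with unconstrained function values. A minimal sketch of that normalization follows; the name `normalized_density`, the trapezoid rule and the quadratic example are assumptions made for the illustration.

```python
import math


def normalized_density(eta, grid=1000):
    """Return u -> exp(eta(u)) / int_0^1 exp(eta(t)) dt, using the trapezoid rule."""
    h = 1.0 / grid
    ys = [math.exp(eta(j * h)) for j in range(grid + 1)]
    norm = h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))
    return lambda u: math.exp(eta(u)) / norm


def eta_example(u):
    # an arbitrary smooth function with eta(0) = 0
    return 2.0 * u - u * u


f = normalized_density(eta_example)
# Adding a constant to eta leaves the density unchanged.
f_shifted = normalized_density(lambda u: eta_example(u) + 3.0)
```

Because a constant shift of the exponent cancels in the ratio, some normalization such as fixing the value at u = 0 is needed for the spline coefficients to be identifiable, which is consistent with the constraints placed on the sieve space.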
3. Asymptotic results. We provide the main results in this section. In 
particular, the consistency and asymptotic distribution of $\hat\alpha_n$ are derived. The 
proofs for all the theorems are given in Section 6. 

3.1. Assumptions. In addition to the assumption that T and C are in- 
dependent given W, we need the following conditions. 

Assumption A1. V has support in [0,1] and X has bounded support 
in $\mathbb{R}^d$, where d is the dimension of X. Moreover, if there exist a constant $c_0$ 
and a constant vector $\gamma$ such that $\gamma^T W = c_0$ almost surely, then $c_0 = 0$ and 
$\gamma = 0$. 

Assumption A2. With probability 1, there exists a positive constant 
$\theta_0$ such that $P(C \ge \tau|W) = P(C = \tau|W) > \theta_0$ and $P(T > \tau|W) > \theta_0$. That 
is, at least some subjects do not fail at the end time $\tau$ and by definition they 
are considered to be right-censored at $\tau$. 

Assumption A3. For a known integer $k > 11$, the conditional density 
of X given $\tilde T$ and V, denoted as $f_{X|\tilde T,V}$, and the true baseline hazard rate, 
$\lambda_0(y)$, satisfy 

$$\log f_{X|\tilde T,V}(x|y,v) \in W^{k+4,2}(\mathbb{R}^{d+2}), \qquad \log \lambda_0(y) \in W^{k+4,2}(\mathbb{R}),$$ 

after appropriate extension to the whole space. Here, $W^{k+4,2}(\mathbb{R}^l)$ is a Sobolev 
space consisting of the functions with (k + 4)th derivatives in $L_2(\mathbb{R}^l)$. 
Furthermore, we assume that 

$$\log f_{C|W}(y|w) \in W^{k+4,2}((0,\tau) \times \mathbb{R}^{d+1}),$$ 

$$\log P(C = \tau|W = w) \in W^{k+4,2}(\mathbb{R}^{d+1}).$$ 

Assumption A4. There exists a known constant M such that the true 
treatment effect $\alpha_0$ satisfies $|\alpha_0| < M$. Moreover, the equation 

$$P[(1-R)W] = P\left\{(1-R)\, \frac{P[I(Y \ge y)\, W e^{\gamma^T W}]}{P[I(Y \ge y)\, e^{\gamma^T W}]}\bigg|_{y=Y}\right\}$$ 

has a unique solution $\gamma^*$ in $[-M, M]^{d+1}$. In addition, for any $\gamma$ in a small 
neighborhood $\mathcal{O}$ of $\gamma^*$, the conditional distribution of $\gamma^T W$ given V = v 
has support $[a(\gamma, v), b(\gamma, v)]$ satisfying: both $a(\cdot)$ and $b(\cdot)$ 
are known functions and they are continuously differentiable 
with respect to $\gamma$; as functions of v, $a(\gamma, v)$ and $b(\gamma, v)$ belong to $W^{k+4,2}(\mathbb{R})$; 
$\min_{v \in [0,1], \gamma \in \mathcal{O}} |b(\gamma, v) - a(\gamma, v)| > 0$. 
Assumption A5. $(M_n, K_n)$ satisfy $M_n, K_n \to \infty$ and 
e 13M " e 16M "iff /3+3 (logif n .) 2 



Assumption A6. $K_n$ satisfies $\sqrt{n} = o(K_n^k)$. 



Remark. Theorem 3.1 of Zeng (2004) showed that the asymptotic limit 
of $\hat\gamma_n$ is equal to $\gamma^*$ given by Assumption A4. It is also implied by 
Assumption A4 that one of the first d components of $\gamma^*$ is nonzero. Thus, if we 
suppose the first component of $\gamma^* = (\gamma_1^*, \dots, \gamma_d^*, \gamma_{d+1}^*)$ is not zero, then in terms 
of $f_{X|\tilde T,V}(x|y, v)$, the conditional density of $U^* = \gamma^{*T} W$ given $(\tilde T = y, V = v)$, 

which is denoted as $f_{U^*}(u|y, v)$ for $\tilde T < \tau$ and as $g_{U^*}(u|\tau, v)$ for $\tilde T = \tau$, can 
be expressed as 

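$$\frac{\Delta(v)}{|\gamma_1^*|} \int f_{X|\tilde T,V}\left(\frac{\Delta(v)u + a(\gamma^*,v) - \gamma_2^* x_2 - \cdots - \gamma_d^* x_d - \gamma_{d+1}^* v}{\gamma_1^*},\, x_2, \dots, x_d \,\Big|\, y, v\right) dx_2 \cdots dx_d,$$ 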

where $\Delta(v) = b(\gamma^*, v) - a(\gamma^*, v)$. Hence, Assumption A3 implies that $f_{U^*}(u|y, v)$ 
and $g_{U^*}(u|\tau, v)$ are bounded away from 0 and their (k + 4)th derivatives are 
also $L_2$-integrable. Furthermore, by the embedding theorem in Sobolev space 
[cf. Adams (1975)], this gives that each of $\log f_{U^*|\tilde T,V}(u|y,v)$, $\log g_{U^*}(u|\tau, v)$, 

$\log f_{C|U^*,V}(y|u, v)$, $\log \lambda_0(y)$ is in the $W^{k,\infty}$ space; that is, their kth derivatives 
are essentially bounded. 



Remark. Assumptions A5 and A6 determine the size of the sieve space 
in terms of the number of knots in the partition ($K_n$) and the bounds of the 
sieve functions ($M_n$). When $k > 11$, such $K_n$ satisfying both Assumptions 
A5 and A6 exists. For example, we can choose $K_n = n^{\beta}$, $\frac{1}{2k} < \beta < \frac{3}{4k+9}$. 
Additionally, the choice of $M_n$ can be of order $\sqrt{\log n}$. 
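To give a feel for how quickly the sieve grows, the sketch below computes the number of knots, the coefficient bound and the number of spline coefficients for a given sample size. The function names are hypothetical, the exponent β is left as an input rather than checked against the assumptions, rounding n^β up to an integer is a choice made here, and the coefficient count (m + K_n)^3 + (m + K_n)^2 + m + K_n is the quantity N_n that appears in the proofs of Section 6.

```python
import math


def spline_coefficient_count(m, k_n):
    """Number of B-spline coefficients for eta_1, eta_2 and xi together."""
    n_basis = m + k_n
    return n_basis ** 3 + n_basis ** 2 + n_basis


def sieve_size(n, beta, big_m=1.0):
    """Return (K_n, M_n) with K_n = ceil(n^beta) and M_n = big_m * sqrt(log n)."""
    k_n = max(1, math.ceil(n ** beta))
    m_n = big_m * math.sqrt(math.log(n))
    return k_n, m_n


count = spline_coefficient_count(3, 5)   # m = 3, K_n = 5
k_n, m_n = sieve_size(200, 0.04)
```

Even modest values of m and K_n produce several hundred coefficients because of the three-dimensional tensor product, which is worth keeping in mind when choosing K_n in practice.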



Although all these assumptions guarantee the validity of the following 
arguments, they are not minimal assumptions. 



3.2. Asymptotic results. 



Theorem 3.1 (Consistency of $\hat\alpha_n$). Suppose that either of the two 
working models is true. Under Assumptions A1-A5, $\hat\alpha_n$ is a consistent estimator 
of the true coefficient $\alpha_0$. 

We can further obtain the consistency of the nuisance parameters in a 
Sobolev norm. 

Theorem 3.2 (Consistency of nuisance parameters). Suppose that 
either of the two working models is true. Under Assumptions A1-A5, 

$$\|\hat\lambda_n(Y) - \lambda_0(Y)\|_{W^{1,\infty}(P)} \to 0,$$ 

$$\|\hat f_n(U^*|Y, V) - f_{U^*}(U^*|Y, V)\|_{W^{1,\infty}(P)} \to 0,$$ 

$$\|\hat g_n(U^*|\tau, V) - g_{U^*}(U^*|\tau, V)\|_{W^{1,\infty}(P)} \to 0.$$ 

Here $\|h(U^*,Y,V)\|_{W^{1,\infty}(P)}$ is defined as $\|h(U^*,Y,V)\|_{L_\infty(P)} + \|\nabla h(U^*,Y,V)\|_{L_\infty(P)}$, 
where P is the probability measure given by $(U^*, Y, V, R)$. 

The result in Theorem 3.2 can help to obtain a useful convergence rate 
of the estimators in L2-norm, which is stated in Theorem 3.3. 

Theorem 3.3 (Convergence rate). Suppose that either of the two work- 
ing models is true. Under Assumptions A1-A5, it holds that 

\&n ~ «o| 2 + \\k(Y) - AoOO||! 2(P) < O p (Jg) + °v (j=) > 
\\fn(U*\Y, V) - fy. (U*\Y, V)\\l 2{P) < O p (J^j + o p (-)=) 

and 

\\g n (U*\r, V) - gu * (U*\t, V)\\l 2{P) < O p (-^) + o p (J=) . 

Finally, we derive the asymptotic distribution of $\sqrt{n}(\hat\alpha_n - \alpha_0)$. 

Theorem 3.4 (Asymptotic normality of $\hat\alpha_n$). Under Assumptions A1-A6, 
when either of the two working models is correct, $\sqrt{n}(\hat\alpha_n - \alpha_0)$ is 
asymptotically normal. Furthermore, when both working models are correct, the 
asymptotic variance of $\sqrt{n}(\hat\alpha_n - \alpha_0)$ is the same as the semiparametric 
efficiency bound. 

3.3. Variance estimation. We propose the following steps to estimate 
the asymptotic variance of $\hat\alpha_n$ with no attempt to justify them rigorously. 
Our approach is to estimate the influence function of $\hat\alpha_n$ directly. 

Define O = (Y, R, W) and define $\psi$ as the nuisance parameters consisting 
of $(f_{U(\gamma)}(u|y,v), g_{U(\gamma)}(u|\tau,v), \lambda(y))$. Let $l(\psi, \alpha; \gamma)$ be the log-likelihood 
function from a single observed statistic, let $l_\alpha$ be the derivative of $l(\psi, \alpha; \gamma)$ 
with respect to $\alpha$ and let $l_\psi$ be the differential operator of $l(\psi, \alpha; \gamma)$ with 
respect to $\psi$. According to the proof of Theorem 3.4, there exists a function 
$h(u,y,v) = (h_1(u,y,v), h_2(u,v), h_3(y))$ solving the equation $l_\psi^* l_\psi[h] = l_\psi^* l_\alpha$, 
where $l_\psi^*$ is the dual operator of $l_\psi$. Moreover, $\sqrt{n}(\hat\alpha_n - \alpha_0)$ is shown to have 
the asymptotic variance 

(3.1) $$E\left[\Sigma(\psi_0, \alpha_0, \gamma^*)^{-1}\, \Omega(O; \psi_0, \alpha_0, \gamma^*) + \omega(\psi_0, \alpha_0, \gamma^*)^T S(O; \gamma^*)\right]^2.$$ 

Here, $\Sigma(\psi, \alpha, \gamma) = -P[l_{\psi\alpha}[h] + l_{\alpha\alpha}]$ is the efficient information matrix for $\alpha$ 
for fixed $\gamma$, $\Omega(O; \psi, \alpha, \gamma) = l_\alpha - l_\psi[h]$ is the efficient score function for $\alpha$ for 
fixed $\gamma$, $\omega(\psi, \alpha, \gamma) = -\{P[l_{\psi\alpha}[h] + l_{\alpha\alpha}]\}^{-1} P[\nabla_\gamma(l_\psi[h] + l_\alpha)]$, and $S(O; \gamma^*)$ is 
the influence function of $\hat\gamma_n$. 

To estimate (3.1), we wish to estimate each of the four terms 
$\Sigma(\psi_0, \alpha_0, \gamma^*)$, $\Omega(O; \psi_0, \alpha_0, \gamma^*)$, $\omega(\psi_0, \alpha_0, \gamma^*)$ and $S(O; \gamma^*)$. To begin, we define 
a pseudo-profile likelihood function as $pl_n(\alpha, \gamma) = n^{-1} \sum_{i=1}^n l_i(\hat\psi(\alpha, \gamma), \alpha; \gamma)$, 
where $l_i(\cdot)$ is the value of $l(\cdot)$ at the ith observation and $\hat\psi(\alpha, \gamma)$ is the 
argument of $\psi$ in the maximization of Section 2 when $\alpha$ and $\gamma$ are fixed. 
Then each of the four terms in (3.1) can be estimated using the following 
approach. 

First, since $\Sigma(\psi_0, \alpha_0, \gamma^*)$ is the semiparametric efficient information for 
$\alpha$ in the likelihood function of $(Y, R, U^*, V)$ when assuming $\gamma^*$ is known, 
according to Murphy and van der Vaart (2000), we can estimate it by 

$$\hat\Sigma_n = -\frac{pl_n(\hat\alpha_n + \varepsilon_n, \hat\gamma_n) - 2\,pl_n(\hat\alpha_n, \hat\gamma_n) + pl_n(\hat\alpha_n - \varepsilon_n, \hat\gamma_n)}{\varepsilon_n^2},$$ 

where $\varepsilon_n$ is a constant of order $n^{-1/2}$. 
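The curvature estimate just described is a symmetric second difference of the profile likelihood. A minimal sketch, with the hypothetical name `information_estimate` and a quadratic stand-in for the pseudo-profile likelihood (in practice each evaluation requires re-maximizing over the nuisance parameters at the perturbed value of α):

```python
def information_estimate(pl, alpha_hat, eps):
    """Negative second difference of pl at alpha_hat with step eps."""
    return -(pl(alpha_hat + eps) - 2.0 * pl(alpha_hat) + pl(alpha_hat - eps)) / eps ** 2


def pl_example(a):
    # stand-in profile likelihood with maximum at 1 and curvature -4
    return -2.0 * (a - 1.0) ** 2


n = 200
info = information_estimate(pl_example, 1.0, n ** -0.5)
```

For a quadratic profile the second difference recovers the curvature exactly, so `info` equals 4 up to rounding; for a real profile likelihood the result depends on the step size, which is why several step sizes are worth comparing.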

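Next, since $\hat\psi(\alpha, \hat\gamma_n)$ maximizes $P_n l(\alpha, \psi; \hat\gamma_n)$, it holds that 

$$P_n[l_\psi(\hat\psi(\alpha, \hat\gamma_n), \alpha; \hat\gamma_n)[h]] = 0$$ 

for any tangent function h of $\psi$. We differentiate the above equation with 
respect to $\alpha$, then evaluate it at $\hat\alpha_n$. This gives 

$$P_n l_{\psi\alpha}(\hat\psi(\hat\alpha_n, \hat\gamma_n), \hat\alpha_n; \hat\gamma_n)[h] = P_n l_{\psi\psi}(\hat\psi(\hat\alpha_n, \hat\gamma_n), \hat\alpha_n; \hat\gamma_n)[\nabla_\alpha \hat\psi(\hat\alpha_n, \hat\gamma_n), h].$$ 

When n goes to infinity, this equation approximates the equation which 
h(u,y,v) solves. Thus, we expect that $\nabla_\alpha \hat\psi(\hat\alpha_n, \hat\gamma_n) \approx h(u,y,v)$. As a result, 
$\Omega(O_i; \psi_0, \alpha_0, \gamma^*) \approx \nabla_\alpha l_i(\hat\psi(\hat\alpha_n, \hat\gamma_n), \hat\alpha_n; \hat\gamma_n)$, while the latter can be 
evaluated using the numerical difference $\varepsilon_n^{-1}\{l_i(\hat\psi(\hat\alpha_n + \varepsilon_n, \hat\gamma_n), \hat\alpha_n + \varepsilon_n; \hat\gamma_n) - 
l_i(\hat\psi(\hat\alpha_n, \hat\gamma_n), \hat\alpha_n; \hat\gamma_n)\}$. 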

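Third, we define $\hat\alpha(\gamma)$ as the estimate of $\alpha$ maximizing $pl_n(\alpha, \gamma)$ when 
$\gamma$ is held fixed. Using an argument similar to that in Zeng (2004), we 
can estimate $\omega(\psi_0, \alpha_0, \gamma^*)$ by a vector $\hat\omega_n$ with its jth element equal to 
$\varepsilon_n^{-1}(\hat\alpha(\hat\gamma_n + \varepsilon_n e_j) - \hat\alpha_n)$ for the jth canonical base $e_j$ and $\varepsilon_n$ satisfying $\varepsilon_n = 
o(1)$ and $\sqrt{n}\,\varepsilon_n \to \infty$. 

Finally, $S(O; \gamma^*)$ can be estimated by $\hat S_n(O; \hat\gamma_n)$ using an explicit 
expression given in Zeng (2004). 

Hence, the expression in (3.1) can be estimated by 

$$\frac{1}{n} \sum_{i=1}^n \left[\hat\Sigma_n^{-1}\, \frac{l_i(\hat\psi(\hat\alpha_n + \varepsilon_n, \hat\gamma_n), \hat\alpha_n + \varepsilon_n; \hat\gamma_n) - l_i(\hat\psi(\hat\alpha_n, \hat\gamma_n), \hat\alpha_n; \hat\gamma_n)}{\varepsilon_n} + \hat\omega_n^T \hat S_n(O_i; \hat\gamma_n)\right]^2.$$ 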

4. Simulation study. A simulation study is conducted to illustrate our 
approach. In the simulation, for convenience of computation V is chosen to 
be a binary variable with equal probabilities. Conditional on V, the lifetime 
T is generated from a proportional hazards regression model with hazard 
rate $3t\exp\{V\}$. One surrogate variable $X_1$ is generated from the model 
$X_1 = \beta_0 T + 0.5\theta$, where $\theta$ is uniformly distributed in (-0.5, 0.5) and $\beta_0$ may 
take different values in the simulation study. The study end time, $\tau$, is chosen 
to be 1. Additionally, we generate another irrelevant covariate $X_2$ from 
the uniform distribution in [0, 1] and generate the right-censoring time from 
a proportional hazards model with hazard rate $4\exp\{2X_1 - 4X_2 - 0.1V\}$. In 
other words, the simulation imitates the situation in which lifetime and 
censoring time are dependent and their dependence is explained by treatment 
V, a surrogate variable $X_1$ and a censorship related variable $X_2$. 
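A data set of this kind can be generated by inverse-transform sampling. The sketch below is an illustration under stated assumptions rather than the code used for the reported results: the function name `simulate` is hypothetical, the lifetime hazard is read as 3t exp(V) (so that the cumulative hazard is 1.5 t² exp(V) and the true treatment effect is 1), and subjects still under observation at the study end time are censored there.

```python
import math
import random


def simulate(n, beta0, alpha=1.0, tau=1.0, seed=0):
    """Return a list of (Y, R, X1, X2, V) tuples for n subjects."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        v = rng.randint(0, 1)                     # binary treatment, equal odds
        # Solve 1.5 * t^2 * exp(alpha * v) = E with E ~ Exp(1) for t.
        e = rng.expovariate(1.0)
        t = math.sqrt(e / (1.5 * math.exp(alpha * v)))
        theta = rng.uniform(-0.5, 0.5)
        x1 = beta0 * t + 0.5 * theta              # surrogate of the lifetime
        x2 = rng.uniform(0.0, 1.0)                # related to censoring only
        rate = 4.0 * math.exp(2.0 * x1 - 4.0 * x2 - 0.1 * v)
        c = min(rng.expovariate(rate), tau)       # administrative censoring at tau
        y = min(t, c)
        r = 1 if t <= c else 0
        data.append((y, r, x1, x2, v))
    return data


sample = simulate(200, 0.0)
```

Setting `beta0` to a nonzero value makes the censoring hazard depend on the lifetime through X1, which is the dependent-censoring mechanism the study is designed to probe.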

According to our approach, the estimate of $\alpha$ is obtained by maximizing 
a pseudo-likelihood function over a sieve space, which is constructed as 
in Section 2.3, with the choice $K_n = 5$ and m = 3 (other choices of $K_n$ 
and m have little effect on the results, but a large $K_n$ significantly increases 
computation time). Since V is binary, for either value of V, $\eta_1(U,Y,V)$ is 
given as a linear combination of $N_i^m(U)N_j^m(Y)$ and $\eta_2(U,V)$ is given as a 
linear combination of $N_i^m(U)$. To prevent the parameters in the maximization 
from being unbounded, a penalty function, equal to $10^{-3}$ times the sum 
of squares of the spline coefficients, is subtracted from the pseudo-likelihood 
function. In the optimization, the search for the maximum starts from the 
initial values $\alpha = 1$ and all spline coefficients equal to zero. Our simulations 
show that the search usually converges within 10 iterations, 
where convergence is declared when either the step size or the norm of the search 
direction is small enough. 

The asymptotic variance of $\hat\alpha_n$ is estimated using the approach described 
in Section 3.3. In particular, we choose $\varepsilon_n = n^{-1/2}, 3n^{-1/2}, 5n^{-1/2}$ and $\varepsilon_n = 
n^{-1/3}, 5n^{-1/3}$ in evaluating $\hat\Sigma_n^{-1}$ and $\hat\omega_n$. The results indicate that the 
estimates of the variance are fairly robust to these choices. Thus, only the 
results from $\varepsilon_n = n^{-1/2}$ and $\varepsilon_n = n^{-1/3}$ are reported here. 

We choose $\beta_0 = 0$ or $\beta_0 = 1.5$ in the simulation. When $\beta_0 = 0$, the 
working model for T is correct and the theoretical censoring rate is 18%; when 
$\beta_0 = 1.5$, the working model for T is misspecified and the theoretical 
censoring rate becomes 36%. Table 1 summarizes the results from 500 repetitions 
with sample size n = 200 for these two choices. In the table, the first column 
gives the true value of the parameter $\beta_0$. The second column gives the 
working models used in the estimation (e.g., T|V means that the working model 
for T is a proportional hazards model with V as independent variable) and 
the superscript star in the column indicates that the indexed working 
model is misspecified. In the third column, we report the naive estimates of 
$\alpha$ obtained by regressing T on V directly. The remaining columns in turn report the 
average estimates of $\hat\alpha_n$, the standard errors of all the estimates, the median 
values of the estimated standard errors for $\hat\alpha_n$ and the coverage proportion 
of 95% confidence intervals based on the normal distribution approximation. 
Additionally, Figure 1 plots the histograms of $\hat\alpha_n$ from the simulations. 
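The coverage column can be computed from the replicated estimates and their estimated standard errors as in the following sketch. The function name `coverage` and the three-replicate example are hypothetical; 1.96 is the usual normal quantile for a 95% interval.

```python
def coverage(estimates, std_errors, truth, z=1.96):
    """Proportion of intervals estimate +/- z * se that contain the true value."""
    hits = 0
    for est, se in zip(estimates, std_errors):
        if est - z * se <= truth <= est + z * se:
            hits += 1
    return hits / len(estimates)


# Three toy replicates with true value 1: only the first interval covers it.
cov = coverage([1.0, 1.5, 0.7], [0.2, 0.2, 0.1], 1.0)
```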

The simulation results indicate that when either working model is correct, 
the estimates produce small bias and moreover, our variance estimation 
approach gives fairly accurate estimates and valid coverage probabilities. 
Specifically, when T is not fully predicted by V and the working model 
for the censorship is correct, our estimate has smaller bias than the naive 
estimate. The simulation also shows that using the correct working model 
for T may give a more efficient estimate. The amount of bias in $\hat\alpha_n$ observed 
in Table 1 can be due to the small sample size and the small $K_n$, as well as 
the imprecise evaluation of the integral in the likelihood function. 

5. Discussion. For right-censored data, when the dependence between 
lifetime and censoring time is explained by many covariates, we utilize two 
working models to condense this high-dimensional information and thus 
derive the estimator of the treatment effect by maximizing some pseudo- 
likelihood function. We have shown that the estimator is consistent and 
asymptotically normal when either working model is correct. 

For simplicity, the working model for T given W in Section 2 is 
assumed to be the same as that for T given V. This may seem very restrictive. 
However, in practice any semiparametric model can be adopted as the working 
model for T given W. For example, suppose that we use a semiparametric 
model for T given W as follows: $f_{T|W}(y|w) = p(y, \beta^T w)$; then the condensed 



Table 1 

Simulation results from 500 repetitions with sample size 200 

$\beta_0$   Working models             Naive est.   $\hat\alpha_n$   se($\hat\alpha_n$)   med(se)   95% CI 
0           (T|V), (C|X_1,X_2,V)       1.004        0.974            0.169                0.172     0.956 
            (T|V), (C|X_2,V)*          1.004        0.975            0.169                0.172     0.954 
1.5         (T|V)*, (C|X_1,X_2,V)      0.835        0.915            0.189                0.234     0.976 
            (T|V)*, (C|X_2,V)*         0.835        0.802            0.186                0.208     0.868 

Fig. 1. Histograms of $\hat\alpha_n$ from 500 repetitions: (a) both working models for T and C are 
correct, (b) the working model for T is correct but the working model for C is misspecified, 
(c) the working model for T is misspecified but the working model for C is correct, (d) 
both working models are misspecified. 



information will include $(U_1 = \beta^T W, U_2 = \gamma^T W, V)$. Hence, the estimator of 
$\alpha$ can be derived by maximizing 

$$\begin{aligned} & P_n\left\{R \log\left[e^{\alpha V}\lambda(Y) \exp\{-\Lambda(Y)e^{\alpha V}\}\, f_{U_1,U_2}(\hat U_1, \hat U_2|Y,V)\right]\right\} \\ & + P_n\left\{(1-R) \log\left[\int_Y^\tau e^{\alpha V}\lambda(s) \exp\{-\Lambda(s)e^{\alpha V}\}\, f_{U_1,U_2}(\hat U_1, \hat U_2|s,V)\,ds + \exp\{-\Lambda(\tau)e^{\alpha V}\}\, f_{U_1,U_2}(\hat U_1, \hat U_2|T \ge \tau, V)\right]\right\} \end{aligned}$$ 

over a sieve space of the parameters $(\alpha, f_{U_1,U_2}(u_1,u_2|T = y, v), f_{U_1,U_2}(u_1,u_2|T \ge 
\tau, v), \lambda(y))$, where $\hat U_1 = \hat\beta_n^T W$ and $\hat U_2 = \hat\gamma_n^T W$ for some estimators $\hat\beta_n, \hat\gamma_n$. The 
slight difference from the previous context is that B-splines in the sieve space 
are constructed on a four-dimensional space. Consequently, under some 
regularity conditions, one of the following two conclusions is expected to be true: 
if the semiparametric working model for T given W does not satisfy the 
constraint that $h_{T|V}(y|v) = \lambda(y)e^{\alpha v}$, that is, the working model is misspecified, 
the consistency of $\hat\alpha_n$ holds if the working model for C given W is correct; 
on the other hand, if the working model for T given W satisfies the constraint, 
the double robustness of $\hat\alpha_n$ given in Theorem 3.4 holds as well. However, it 
is often difficult to specify a correct working model for T given W satisfying 
the constraint (1.1) except in the simplest situation that T depends on W 
only via V, which has been used in this paper. 

Our approach can be easily extended to the situation when V is multi- 
dimensional and possibly discrete. If V is multidimensional, the sieve space 
needs to be constructed on a real space of all of U, Y and the multidi- 
mensional V. However, if V is discrete, the sieve space only needs to be 
constructed on a real space U and Y for each category of V . The latter has 
already been implemented in the simulation study. 

We acknowledge that our approaches are not easily generalized to the 
situation with a time-dependent component in X, since when X contains 
time-dependent covariates the condensed information using working models 
still depends on time, so its dimensionality is not reduced essentially. Further 
investigation is being conducted to solve this problem. 

6. Proofs. For convenience of writing, we assume $\tau = 1$ and denote $G_n$ 
as the empirical process $\sqrt{n}(P_n - P)$. 

Proof of Theorem 3.1. The whole proof can be divided into three 
steps: first, we construct some functions in the sieve space which approximate 
the true parameters; then by using empirical process theory, we obtain one 
key inequality; finally, this inequality is used to obtain the consistency. 

Step 1. We construct some functions in $S_n(m,K_n,M_n)$ to approximate 
the true parameters. To do that, we need the following general result. From 
the properties of B-spline functions [cf. Schumaker (1981)], we can define a 
linear operator $Q_p$ mapping $W^{k,\infty}([0,1]^p)$ to the sieve space; that is, for any 
$g \in W^{k,\infty}([0,1]^p)$, 

$$Q_p[g] = \sum_{i_1,\dots,i_p=1}^{m+K_n} \Gamma_{i_1,\dots,i_p}[g]\, N^m_{i_1}(x_1) \cdots N^m_{i_p}(x_p),$$ 

where $\Gamma_{i_1,\dots,i_p}$ are linear functionals in $L_\infty([0,1]^p)$. Moreover, 

$$\sum_{i_1,\dots,i_p}^{m+K_n} |\Gamma_{i_1,\dots,i_p}[g]| \le (2m+1)\,9^{(m-1)} \|g\|_{L_\infty([0,1]^p)},$$ 

and according to Schumaker [(1981), Theorem 12.7], 

$$\|Q_p[g] - g\|_{L_\infty([0,1]^p)} \le \frac{C(m)}{K_n^k} \|g\|_{W^{k,\infty}([0,1]^p)}.$$ 
Thus, we define $\eta_{1n}(u,y,v) = Q_3[\log f_{U^*}] - Q_3[\log f_{U^*}]|_{u=0}$, $\eta_{2n}(u,v) = 
Q_2[\log g_{U^*}] - Q_2[\log g_{U^*}]|_{u=0}$ and $\xi_n(y) = Q_1[\log \lambda_0]$. Correspondingly, we 
obtain 

$$f_n(u|y,v) = \frac{\exp\{\eta_{1n}(u,y,v)\}}{\int_0^1 \exp\{\eta_{1n}(u,y,v)\}\,du}, \qquad g_n(u|\tau,v) = \frac{\exp\{\eta_{2n}(u,v)\}}{\int_0^1 \exp\{\eta_{2n}(u,v)\}\,du},$$ 

and $\lambda_n(y) = \exp\{\xi_n(y)\}$. As a result of the fact that $\sum_{i=1}^{m+K_n} N_i^m(u) = 1$, 
$(\alpha_0, \xi_n(y), \eta_{1n}(u,y,v), \eta_{2n}(u,v))$ is in the sieve space $S_n(m,K_n,M_n)$ and 
moreover, 

$$\|f_n - f_{U^*}\|_{L_\infty([0,1]^3)} \le O(1) \|\log f_{U^*} - Q_3[\log f_{U^*}]\|_{L_\infty([0,1]^3)} \le O\left(\frac{1}{K_n^k}\right)$$ 

and the same bound holds for $\|g_n - g_{U^*}\|_{L_\infty([0,1]^2)}$ and $\|\lambda_n - \lambda_0\|_{L_\infty([0,1])}$. 

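Step 2. We obtain a key inequality using empirical process theory. To simplify the notation, for any functions $f_1(u,y,v)$, $f_2(u,v)$ and $f_3(y)$, we denote $G(r, f_1, f_2, f_3, \alpha; \gamma)$ as the likelihood function from one single observation with parameters $(\alpha, f_3, f_1, f_2)$. Since $(\hat\alpha_n, \hat\lambda_n, \hat f_n, \hat g_n)$ maximizes $\mathbf{P}_n[\log G(R, f_1, f_2, f_3, \alpha; \hat\gamma_n)]$ over the sieve space, it follows that

$$\mathbf{P}_n[\log G(R, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \hat\gamma_n)] \ge \mathbf{P}_n[\log G(R, f_n, g_n, \lambda_n, \alpha_0; \hat\gamma_n)].$$

Equivalently,

$$(\mathbf{P}_n - \mathbf{P})\Big[\log\frac{G(R, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \hat\gamma_n)}{G(R, f_n, g_n, \lambda_n, \alpha_0; \hat\gamma_n)}\Big] \ge \mathbf{P}\Big[\log\frac{G(R, f_n, g_n, \lambda_n, \alpha_0; \hat\gamma_n)}{G(R, f_{U^*}, g_{U^*}, \lambda_0, \alpha_0; \gamma^*)}\Big] + \mathbf{P}\Big[\log\frac{G(R, f_{U^*}, g_{U^*}, \lambda_0, \alpha_0; \gamma^*)}{G(R, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \hat\gamma_n)}\Big], \tag{6.1}$$

where we recall that $f_{U^*}$ and $g_{U^*}$ are the conditional densities of $U^*$ given $(T, V)$ and $(T > \tau, V)$, respectively.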

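We want to bound the left-hand side of (6.1) using empirical process theory. For this purpose, we consider a class of functions $\mathcal{C}_n$ defined by

$$\mathcal{C}_n = \Big\{ \log\frac{G(r, \tilde f_n, \tilde g_n, \tilde\lambda_n, \alpha; \hat\gamma_n)}{G(r, f_n, g_n, \lambda_n, \alpha_0; \hat\gamma_n)} : \tilde\lambda_n(y) = \exp\{\tilde\xi(y)\},\ \tilde f_n(u|y,v) = \frac{\exp\{\tilde\eta_1(u,y,v)\}}{\int_0^1 \exp\{\tilde\eta_1(u,y,v)\}\,du},\ \tilde g_n(u|\tau,v) = \frac{\exp\{\tilde\eta_2(u,v)\}}{\int_0^1 \exp\{\tilde\eta_2(u,v)\}\,du},\ (\alpha, \tilde\xi, \tilde\eta_1, \tilde\eta_2) \in S_n(m,K_n,M_n) \Big\}.$$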
Since $\|N_i^m(u)\|_{L_\infty([0,1])} \le 1$, any function of $\tilde f_n, \tilde g_n, \tilde\lambda_n$ given in $\mathcal{C}_n$ is bounded by $O(e^{2M_n})$. By Assumptions A1 and A2, $G(r, f_n, g_n, \lambda_n, \alpha_0; \hat\gamma_n)$ is bounded away from 0. Hence, the class $\mathcal{C}_n$ has an upper bound $O_p(M_n)$. Moreover, this class can be regarded as the class of functions indexed by $\alpha$, $\tilde\eta^1_{i_1,i_2,i_3}$, $\tilde\eta^2_{i_1,i_2}$ and $\tilde\xi_i$, which are the respective B-spline coefficients of $\tilde\eta_1$, $\tilde\eta_2$ and $\tilde\xi$ in $S_n(m,K_n,M_n)$. Tedious checking indicates that the function in $\mathcal{C}_n$ is Lipschitz continuous with respect to all these parameters and the Lipschitz constant is bounded by $O_p(e^{6M_n})$. In addition, since $|\tilde\eta^1_{i_1,i_2,i_3}|$, $|\tilde\eta^2_{i_1,i_2}|$ and $|\tilde\xi_i|$ are bounded by $M_n$ and $|\alpha|$ is bounded by $M$, they lie in a hypercube of a real space $R^{N_n+1}$ where $N_n = (m + K_n)^3 + (m + K_n)^2 + m + K_n$. Therefore, for any $\varepsilon > 0$, if we partition this hypercube into subcubes with scale length $\varepsilon$, the total number of subcubes is at most $O((M_n/\varepsilon)^{N_n})$. According to the Lipschitz property of the functions in $\mathcal{C}_n$, the $L_\infty$-distance between any two functions of $\mathcal{C}_n$ with respective indexes in the same subcube is no more than $O_p(e^{6M_n})N_n\varepsilon$. Consequently, we obtain that the bracketing number for $\mathcal{C}_n$ satisfies $N_{[\,]}(O_p(e^{6M_n})N_n\varepsilon, \mathcal{C}_n, L_\infty) \le O(1)(M_n/\varepsilon)^{N_n}$. According to van der Vaart [(1998), Theorem 19.35], in probability we have



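$$\sqrt{n}\,E_P\|\mathbf{P}_n - \mathbf{P}\|_{\mathcal{C}_n} \le O_p(1)\int_0^{O_p(M_n)} \sqrt{\log N_{[\,]}(\varepsilon, \mathcal{C}_n, L_\infty)}\,d\varepsilon \le O_p(1)K_n^{3/2}(\log K_n)M_n^2.$$

Thus, the left-hand side of inequality (6.1) is bounded by $O_p(M_n^2K_n^{3/2}\log K_n/\sqrt{n})$ from above.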
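The $K_n^{3/2}$ factor comes from the square root of the number of spline coefficients, which grows like $K_n^3$. A rough numerical sketch of this scaling follows; it drops all constants and the rescaling of $\varepsilon$ by the Lipschitz constant, and the function names and the midpoint quadrature are assumptions made for illustration.

```python
import math

def n_coefficients(m, K):
    """N_n = (m+K)^3 + (m+K)^2 + (m+K): spline coefficients of eta_1, eta_2 and xi."""
    b = m + K
    return b ** 3 + b ** 2 + b

def entropy_integral(m, K, M, delta, steps=20000):
    """Midpoint-rule value of int_0^delta sqrt(N_n * log(M / eps)) d eps, assuming delta <= M."""
    N = n_coefficients(m, K)
    h = delta / steps
    return sum(math.sqrt(N * math.log(M / ((j + 0.5) * h))) * h for j in range(steps))

# Example: cubic splines (m = 4); the integral grows like sqrt(N_n), i.e. roughly K_n^{3/2}
ratio = entropy_integral(4, 12, 5.0, 1.0) / entropy_integral(4, 4, 5.0, 1.0)
```

Because $N_n$ enters only through $\sqrt{N_n}$, the ratio of two such integrals equals $\sqrt{N_n/N_n'}$ for fixed $M$ and $\delta$.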

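We denote the two terms in the right-hand side of (6.1) as (I) and (II) and wish to bound them from below. Since the functional $G(\cdot)$ is Lipschitz continuous with each component, we have that

$$\text{(I)} \ge -O_p(1)\{\|f_n - f_{U^*}\|_{L_\infty} + \|g_n - g_{U^*}\|_{L_\infty} + \|\lambda_n - \lambda_0\|_{L_\infty} + |\hat\gamma_n - \gamma^*|\}.$$

On the other hand, by Schumaker [(1981), Theorem 4.22] $|dN_i^m(u)/du| \le O(K_n)$. We can easily verify that

$$|G(R, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \hat\gamma_n) - G(R, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \gamma^*)| \le O(e^{2M_n}M_nK_n)|\hat\gamma_n - \gamma^*|.$$

Therefore,

$$\text{(II)} \ge -O_p(e^{2M_n})M_nK_n|\hat\gamma_n - \gamma^*| + \mathbf{P}\Big[\log\frac{G(R, f_{U^*}, g_{U^*}, \lambda_0, \alpha_0; \gamma^*)}{G(R, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \gamma^*)}\Big].$$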
However, the last term in the above is the Kullback-Leibler information. We linearize the last term. The first-order term in the expansion vanishes while the second-order term in the expansion is bounded from below by

$$O(e^{-3M_n})\|G(R, f_{U^*}, g_{U^*}, \lambda_0, \alpha_0; \gamma^*) - G(R, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \gamma^*)\|^2_{L_2(P)}.$$
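A lower bound of this type can also be seen through the Hellinger distance. The chain below is a sketch of that alternative route, not the Taylor expansion used in the proof; $p$ and $q$ denote two generic densities with respect to a measure $\mu$, notation introduced only for this illustration.

```latex
\int p\,\log\frac{p}{q}\,d\mu
  \;\ge\; \int \bigl(\sqrt{p}-\sqrt{q}\bigr)^2\,d\mu
  \;=\; \int \frac{(p-q)^2}{\bigl(\sqrt{p}+\sqrt{q}\bigr)^2}\,d\mu
  \;\ge\; \frac{1}{4\,\sup\max(p,q)}\int (p-q)^2\,d\mu .
```

When both densities are bounded above by a constant multiple of a power of $e^{M_n}$, the leading factor decays at the corresponding exponential rate in $M_n$, which is the role played by the exponential factor in the bound above.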

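Combining the above results and noting that the probability measure $P$ is equivalent to the product measure of the Lebesgue measure in $[0,1]^3$ and the counting measure for $\{0,1\}$, we obtain that for $r = 0, 1$,

$$O_p(1)\Big(\frac{e^{5M_n}M_nK_n}{\sqrt{n}} + \frac{e^{3M_n}}{K_n^k} + \frac{e^{3M_n}M_n^2K_n^{3/2}\log K_n}{\sqrt{n}}\Big) \ge \int_{[0,1]^3}[G(r, f_{U^*}, g_{U^*}, \lambda_0, \alpha_0; \gamma^*) - G(r, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \gamma^*)]^2\,du\,dy\,dv. \tag{6.2}$$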



Step 3. We obtain the $L_2$-convergence of the estimators. Suppose we select $K_n$ and $M_n$ such that they satisfy Assumption A5. Equation (6.2) implies that this upper bound holds for the square $L_2$-distance between $\int_0^s\int_0^1 G(1, f_{U^*}, g_{U^*}, \lambda_0, \alpha_0; \gamma^*)\,du\,dy$ and $\int_0^s\int_0^1 G(1, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \gamma^*)\,du\,dy$ for any $s \in [0,1]$. After simplification, we obtain that

$$\int_0^1 [\exp\{-e^{\hat\alpha_n v}\hat\Lambda_n(s)\} - \exp\{-e^{\alpha_0 v}\Lambda_0(s)\}]^2\,dv \le O_p(1)\Big\{\frac{e^{3M_n}}{K_n^k} + \frac{e^{6M_n}K_n^{3/2}\log K_n}{\sqrt{n}}\Big\}. \tag{6.3}$$

By choosing a subsequence, we suppose $\hat\alpha_n \to \alpha^*$. From the above inequality and Assumption A1, $\alpha^* = \alpha_0$ and $\hat\Lambda_n(y)$ converges pointwise to $\Lambda_0(y)$ for $y \in [0,1]$. Furthermore, since $\Lambda_0$ is continuous, $\|\hat\Lambda_n(y) - \Lambda_0(y)\|_{L_\infty([0,1])} \to 0$. This completes the proof of Theorem 3.1. □

Proof of Theorem 3.2. From the last inequality and Assumption A1, we immediately obtain that

$$|\hat\alpha_n - \alpha_0|^2 \le O_p(1)\Big\{\frac{e^{3M_n}}{K_n^k} + \frac{e^{6M_n}K_n^{3/2}\log K_n}{\sqrt{n}}\Big\}.$$

After repeatedly using (6.2) for $R = 1$ and $R = 0$, we can further obtain that the same bound holds for $\|\hat\lambda_n - \lambda_0\|^2_{L_2(P)}$, $\|\hat f_n - f_{U^*}\|^2_{L_2(P)}$ and $\|\hat g_n - g_{U^*}\|^2_{L_2(P)}$.
On the other hand, from Schumaker [(1981), Theorem 4.22] and Assumption A3, we have that

$$\|\nabla_u^{k_1}\nabla_y^{k_2}\nabla_v^{k_3}\hat\eta_{1n}(u,y,v)\|_{L_\infty(P)} \le CK_n^k\sum_{i_1,i_2,i_3=1}^{m+K_n}|\hat\eta_{i_1,i_2,i_3}| \le O(M_nK_n^k),$$

where $k_1 + k_2 + k_3 = k$. Thus, $\|\nabla_u^{k_1}\nabla_y^{k_2}\nabla_v^{k_3}\hat f_n(u|y,v)\|_{L_\infty(P)} \le Ce^{(k+1)M_n}M_nK_n^k$. According to the Sobolev interpolation inequality [cf. Adams (1975)], we obtain that

3M n c 6M n 7^3/2 



|V(/ n - A/OIIwp) < Ce^ M ^K^ ( + 



e 6M nR V2 l R , (l_ n) / 2 



n 



where t\ = By the choice of $K_n$ and $M_n$ in Assumption A5, $\|\nabla(\hat f_n - f_{U^*})\|_{L_\infty(P)}$ converges to zero. Similarly, this is true for $\hat g_n$ and $\hat\lambda_n$. Thus, Theorem 3.2 holds. □



Proof of Theorem 3.3. Using the results from Theorems 3.1 and 3.2, redo the proof of Theorem 3.1. We define $\mathcal{C}_n$ as a class as before, but the functions in $\mathcal{C}_n$ are indexed by $(\alpha, \xi, f_{U(\gamma)}, g_{U(\gamma)})$, which belongs to a bounded set in $R \times \{W^{1,\infty}(P)\}^3$. Thus, $\mathcal{C}_n$ has a bounded covering function and the integration of the entropy for the class $\mathcal{C}_n$ is finite. Moreover, the function in the left-hand side of (6.1) converges to zero uniformly. Thus, we can apply Theorem 2.11.23 of van der Vaart and Wellner (1996) to obtain that the left-hand side of inequality (6.1) is bounded by $o_p(1/\sqrt{n})$. For the right-hand side of (6.1), we still perform Taylor expansion at the true parameters. Since each parameter is in a small neighborhood of the true parameters, the right-hand side of (6.1) is bounded from below by

$$-O_p\{|\hat\gamma_n - \gamma^*|^2 + \|f_n - f_{U^*}\|^2_{L_2(P)} + \|g_n - g_{U^*}\|^2_{L_2(P)} + \|\lambda_n - \lambda_0\|^2_{L_2(P)}\} + O_p(1)\|G(R, f_{U^*}, g_{U^*}, \lambda_0, \alpha_0; \gamma^*) - G(R, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \gamma^*)\|^2_{L_2(P)}.$$

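Recall the construction of $f_n$, $g_n$ and $\xi_n$ in the first step of proving Theorem 3.1; we obtain that

$$\frac{O_p(1)}{K_n^{2k}} + \frac{o_p(1)}{\sqrt{n}} \ge \|G(R, f_{U^*}, g_{U^*}, \lambda_0, \alpha_0; \gamma^*) - G(R, \hat f_n, \hat g_n, \hat\lambda_n, \hat\alpha_n; \gamma^*)\|^2_{L_2(P)}.$$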

The results of Theorem 3.3 thus follow from the same arguments as in the 
proof of Theorem 3.2. □ 

Proof of Theorem 3.4. We will write $\sqrt{n}(\hat\alpha_n - \alpha_0)$ as a linear functional of the empirical process $\mathbf{G}_n$. The whole proof can be divided into the following five steps.

Step 1. We define a pseudo least favorable direction for $\alpha_0$ when $\gamma^*$ is known. The nuisance parameters for $\alpha$ are $(f_{U^*}, g_{U^*}, \lambda)$ and are denoted as $\psi$. The tangent space for $\psi$ is thus given by

$$H = \Big\{h(u,y,v) = (h_1(u,y,v), h_2(u,v), h_3(y)) : \int_0^1 h_1(u,y,v)\,du = 0,\ \int_0^1 h_2(u,v)\,du = 0,\ h(u,y,v) \in L_2([0,1]^3)\Big\}.$$

Let $l(\psi, \alpha; \gamma^*)$ be as defined in Section 3.3. Then a pseudo least favorable direction for $\alpha_0$ is defined as a tangent function $h(u,y,v) \in H$ for $\psi$ that satisfies

$$l_\psi^*(\psi_0, \alpha_0; \gamma^*)\,l_\psi(\psi_0, \alpha_0; \gamma^*)[h] = l_\psi^*(\psi_0, \alpha_0; \gamma^*)\,l_\alpha(\psi_0, \alpha_0; \gamma^*) \quad \text{a.s.},$$

where $l_\psi(\psi_0, \alpha_0; \gamma^*)[h]$ is the derivative of $l(\cdot)$ with respect to $\psi$ along the direction $h$, $l_\psi^*(\psi_0, \alpha_0; \gamma^*)$ is the adjoint operator of $l_\psi(\psi_0, \alpha_0; \gamma^*)$ in the Hilbert space $L_2(P)$ and $l_\alpha(\psi_0, \alpha_0; \gamma^*)$ is the derivative of $l$ with respect to $\alpha$.
Step 2. We prove the existence of the pseudo least favorable direction. We note that $H$ is a Hilbert space with $\langle h, h\rangle_H$ given by

$$\|h_1(u,y,v)\|^2_{L_2([0,1]^3)} + \|h_2(u,v)\|^2_{L_2([0,1]^2)} + \|h_3(y)\|^2_{L_2([0,1])}.$$

Then the following lemma holds.

Lemma 6.1. Under Assumptions A1-A4, there exists a unique $h \in H$ such that

$$l_\psi^*(\psi_0, \alpha_0; \gamma^*)\,l_\psi(\psi_0, \alpha_0; \gamma^*)[h] = l_\psi^*(\psi_0, \alpha_0; \gamma^*)\,l_\alpha(\psi_0, \alpha_0; \gamma^*).$$

Proof. Define $A$ as a linear operator from $L_2([0,1])$ to $L_2([0,1]^2)$, given by $A[h_3] = -e^{\alpha_0 v}\int_0^y h_3(s)\,ds + h_3(y)/\lambda_0(y)$. After some calculation and using the property $\int_0^1 h_1(u,y,v)\,du = 0$, we have

$$\begin{aligned}
\|l_\psi(\psi_0, \alpha_0; \gamma^*)[h]\|^2_{L_2(P)}
&\ge \Big\|R\Big\{A[h_3] + \frac{h_1(U^*,Y,V)}{f_{U^*}(U^*|Y,V)}\Big\}\Big\|^2_{L_2(P)} + \Big\|(1-R)I(Y=\tau)\frac{h_2(U^*,V)}{g_{U^*}(U^*|\tau,V)}\Big\|^2_{L_2(P)} \\
&\ge O(1)\Big\{\int_{[0,1]^2} A[h_3]^2\,dy\,dv + \|h_1\|^2_{L_2([0,1]^3)} + \|h_2\|^2_{L_2([0,1]^2)}\Big\}.
\end{aligned}$$

Since $A$ is invertible, $\|A[h_3]\| \ge \|A^{-1}\|^{-1}\|h_3\|$. Thus, the last term is bounded from below by $O(1)\langle h, h\rangle_H$. By the Lax-Milgram theorem [Evans (1998)], the operator $l_\psi^*(\psi_0, \alpha_0; \gamma^*)\,l_\psi(\psi_0, \alpha_0; \gamma^*)$ is invertible. Lemma 6.1 is proved. □






Step 3. The proof for the smoothness of the least favorable direction is 
technical so we leave it to one of our technical reports, which is available 
from the author. There we show: 

Lemma 6.2. Under Assumptions A1-A4, $h(u,y,v) \in W^{k,\infty}(P)$.

Step 4. We construct the projection of $h(u,y,v)$ on the tangent space of the sieve space. First, by simple computation, the tangent vectors $h_n(u,y,v)$ for the nuisance parameters at $\hat\psi_n = (\hat f_n(u|y,v), \hat g_n(u|\tau,v), \hat\lambda_n(y))$ have the form

$$\Big(\hat f_n(u|y,v)\xi_1(u,y,v) - \hat f_n(u|y,v)\frac{\int_0^1 \exp\{\hat\eta_{1n}(u,y,v)\}\xi_1(u,y,v)\,du}{\int_0^1 \exp\{\hat\eta_{1n}(u,y,v)\}\,du},\ \ \hat g_n(u|\tau,v)\xi_2(u,v) - \hat g_n(u|\tau,v)\frac{\int_0^1 \exp\{\hat\eta_{2n}(u,v)\}\xi_2(u,v)\,du}{\int_0^1 \exp\{\hat\eta_{2n}(u,v)\}\,du},\ \ \hat\lambda_n(y)\xi_3(y)\Big),$$

where $\xi_1(u,y,v)$, $\xi_2(u,v)$ and $\xi_3(y)$ have the same forms as $\eta_1(u,y,v)$, $\eta_2(u,v)$ and $\xi(y)$ in the sieve space. Then, one good approximation to the pseudo least favorable direction is to choose $h_n(u,y,v) = (h_1^n, h_2^n, h_3^n)$ so that their corresponding $(\xi_1(u,y,v), \xi_2(u,v), \xi_3(y))$ satisfy $\xi_1(u,y,v) = Q_3[h_1/f_{U^*}] - Q_3[h_1/f_{U^*}]|_{u=0}$, $\xi_2(u,v) = Q_2[h_2/g_{U^*}] - Q_2[h_2/g_{U^*}]|_{u=0}$ and $\xi_3(y) = Q_1[h_3/\lambda_0]$. Here the operator $Q_p$ was defined in the proof of Theorem 3.1. Thus, the results in Theorem 3.3 and Lemma 6.2 imply that

$$\|h_n(U^*,Y,V) - h(U^*,Y,V)\|_{L_2(P)} \le O\Big(\frac{1}{K_n^k}\Big) + o_p(n^{-1/4}).$$
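The form of the tangent vectors is the directional derivative of the normalized density $\exp\{\eta\}/\int_0^1\exp\{\eta\}\,du$ with respect to $\eta$. The following one-dimensional sketch checks this against a finite difference; the grid, the quadrature weights and the function names `normalize` and `tangent` are assumptions made for illustration only.

```python
import math

def normalize(eta, weights):
    """Density values exp(eta_j) / sum_j w_j exp(eta_j) on a quadrature grid."""
    e = [math.exp(v) for v in eta]
    z = sum(w * x for w, x in zip(weights, e))
    return [x / z for x in e]

def tangent(eta, xi, weights):
    """Derivative of normalize(eta + t * xi) at t = 0: f * xi - f * int exp(eta) xi / int exp(eta)."""
    f = normalize(eta, weights)
    # int exp(eta) xi / int exp(eta) equals the integral of f * xi
    mean_xi = sum(w * fj * x for w, fj, x in zip(weights, f, xi))
    return [fj * (x - mean_xi) for fj, x in zip(f, xi)]

# Example on a midpoint grid over [0, 1], with arbitrary eta and direction xi
n = 200
w = [1.0 / n] * n
u = [(j + 0.5) / n for j in range(n)]
eta = [math.sin(2 * math.pi * x) for x in u]
xi = [x * x for x in u]
tan = tangent(eta, xi, w)
```

The resulting tangent integrates to zero, which is the constraint $\int_0^1 h_1\,du = 0$ imposed on the tangent space.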

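Step 5. We derive the empirical process for $\sqrt{n}(\hat\alpha_n - \alpha_0)$. Since $(\hat\psi_n, \hat\alpha_n)$ maximizes the log-likelihood in the sieve space, the score along the path $(\hat\alpha_n + \varepsilon, \hat\psi_n + \varepsilon h_n)$ is zero when $\varepsilon = 0$. Then it holds that

$$\mathbf{G}_n\{l_\psi(\hat\psi_n, \hat\alpha_n; \hat\gamma_n)[h_n] + l_\alpha(\hat\psi_n, \hat\alpha_n; \hat\gamma_n)\} = -\sqrt{n}\,\mathbf{P}\{l_\psi(\hat\psi_n, \hat\alpha_n; \hat\gamma_n)[h_n] + l_\alpha(\hat\psi_n, \hat\alpha_n; \hat\gamma_n)\}.$$

For the left-hand side of the above equation, we apply Theorem 2.11.23 of van der Vaart and Wellner (1996). Note that the function in the left-hand side, indexed by both $(\hat\psi_n, h_n) \in W^{1,\infty}$ and $(\hat\alpha_n, \hat\gamma_n) \in [-M, M]^{d+2}$, belongs to a $P$-Donsker class. Moreover, we linearize the right-hand side at the true parameters and approximate $h_n$ by $h$. Since $\mathbf{P}\{l_{\psi\psi}(\psi_0, \alpha_0; \gamma^*)[\hat\psi_n - \psi_0, h] + l_{\alpha\psi}(\psi_0, \alpha_0; \gamma^*)[\hat\psi_n - \psi_0]\} = 0$, we obtain that

$$\begin{aligned}
&-\mathbf{P}\{l_{\psi\alpha}(\psi_0, \alpha_0; \gamma^*)[h] + l_{\alpha\alpha}(\psi_0, \alpha_0; \gamma^*)\}\sqrt{n}(\hat\alpha_n - \alpha_0) \\
&\quad = \mathbf{G}_n\{l_\psi(\psi_0, \alpha_0; \gamma^*)[h] + l_\alpha(\psi_0, \alpha_0; \gamma^*)\} \\
&\qquad + \mathbf{P}\{l_{\psi\gamma}(\psi_0, \alpha_0; \gamma^*)[h] + l_{\alpha\gamma}(\psi_0, \alpha_0; \gamma^*)\}\sqrt{n}(\hat\gamma_n - \gamma^*) \\
&\qquad + \sqrt{n}\,O_p(\|\hat\psi_n - \psi_0\|^2_{L_2(P)} + |\hat\alpha_n - \alpha_0|^2 + \|h_n - h\|^2_{L_2(P)} + |\hat\gamma_n - \gamma^*|^2).
\end{aligned}$$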

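The last term is $o_p(1)$ from Theorem 3.3 and Assumption A6. Hence, the asymptotic normality of $\sqrt{n}(\hat\alpha_n - \alpha_0)$ holds if we can prove the following lemma.

Lemma 6.3. $-\mathbf{P}\{l_{\psi\alpha}(\psi_0, \alpha_0; \gamma^*)[h] + l_{\alpha\alpha}(\psi_0, \alpha_0; \gamma^*)\} > 0$.

Proof. We note that

$$-\mathbf{P}\{l_{\psi\alpha}(\psi_0, \alpha_0; \gamma^*)[h] + l_{\alpha\alpha}(\psi_0, \alpha_0; \gamma^*)\} = \mathbf{P}\{l_\alpha(\psi_0, \alpha_0; \gamma^*) + l_\psi(\psi_0, \alpha_0; \gamma^*)[h]\}^2 \ge 0.$$

Moreover, if $l_\alpha(\psi_0, \alpha_0; \gamma^*) + l_\psi(\psi_0, \alpha_0; \gamma^*)[h]$ is zero, then for $R = 1$,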



Let $Y = 0$. Multiply both sides by $f_{U^*}$, then integrate both sides over the support of $U^*$. We have $V = -h_3(0)/\lambda_0(0)$. This is a contradiction. □
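The contradiction argument can be traced as follows. This is a sketch only: it assumes that, for $R = 1$, the score along $h$ has the form $A[h_3] + h_1/f_{U^*}$ appearing in the proof of Lemma 6.1 and that the score for $\alpha$ is the usual proportional hazards score $V\{1 - e^{\alpha_0 V}\Lambda_0(Y)\}$; neither expression is taken verbatim from the original display.

```latex
% assumed form of the vanishing score for R = 1
V\bigl\{1 - e^{\alpha_0 V}\Lambda_0(Y)\bigr\}
  + \frac{h_3(Y)}{\lambda_0(Y)} - e^{\alpha_0 V}\int_0^Y h_3(s)\,ds
  + \frac{h_1(U^*,Y,V)}{f_{U^*}(U^*|Y,V)} = 0 .
% at Y = 0 the cumulative terms vanish
V + \frac{h_3(0)}{\lambda_0(0)} + \frac{h_1(U^*,0,V)}{f_{U^*}(U^*|0,V)} = 0 .
% multiply by f_{U^*}, integrate over u, and use \int h_1\,du = 0, \int f_{U^*}\,du = 1
V + \frac{h_3(0)}{\lambda_0(0)} = 0 .
```

Since the right-hand side $-h_3(0)/\lambda_0(0)$ is a constant while $V$ varies, the equality cannot hold.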

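Furthermore, we obtain the influence function of $\hat\alpha_n$ to be

$$\begin{aligned}
&-\{\mathbf{P}[l_{\psi\alpha}(\psi_0, \alpha_0; \gamma^*)[h] + l_{\alpha\alpha}(\psi_0, \alpha_0; \gamma^*)]\}^{-1} \\
&\quad\times \{l_\psi(\psi_0, \alpha_0; \gamma^*)[h] + l_\alpha(\psi_0, \alpha_0; \gamma^*) \\
&\qquad + \mathbf{P}[l_{\psi\gamma}(\psi_0, \alpha_0; \gamma^*)[h] + l_{\alpha\gamma}(\psi_0, \alpha_0; \gamma^*)]S(Y, R, W; \gamma^*)\},
\end{aligned}$$

where $S(Y, R, W; \gamma^*)$ is the influence function of $\hat\gamma_n$. When both working models are correct, $l(\psi_0, \alpha_0; \gamma)$ is always the logarithm of the density for $(T \wedge C, R, U(\gamma), V)$ whatever value $\gamma$ takes. So $\mathbf{P}[l_\psi(\psi_0, \alpha_0; \gamma)[h] + l_\alpha(\psi_0, \alpha_0; \gamma)]$ is an expectation of a score function; thus, it is equal to 0. This implies $\mathbf{P}[l_{\psi\gamma}(\psi_0, \alpha_0; \gamma^*)[h] + l_{\alpha\gamma}(\psi_0, \alpha_0; \gamma^*)] = 0$. Hence, $\hat\alpha_n$ has an influence function equal to $-\{\mathbf{P}[l_{\psi\alpha}(\psi_0, \alpha_0; \gamma^*)[h] + l_{\alpha\alpha}(\psi_0, \alpha_0; \gamma^*)]\}^{-1}\{l_\psi(\psi_0, \alpha_0; \gamma^*)[h] + l_\alpha(\psi_0, \alpha_0; \gamma^*)\}$, which is exactly the efficient influence function for $\alpha$. Consequently, the asymptotic variance of $\sqrt{n}(\hat\alpha_n - \alpha_0)$ is equal to the semiparametric efficiency bound. □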
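The last step can be made explicit. Writing $\tilde l = l_\alpha(\psi_0,\alpha_0;\gamma^*) + l_\psi(\psi_0,\alpha_0;\gamma^*)[h]$, a shorthand introduced only here, and treating $\alpha$ as scalar, the identity used in the proof of Lemma 6.3 suggests the following form of the limiting variance when both working models are correct. This is a sketch under those assumptions rather than a formula quoted from the paper.

```latex
-\mathbf{P}\{l_{\psi\alpha}[h] + l_{\alpha\alpha}\} = \mathbf{P}\{\tilde l^{\,2}\}
\quad\Longrightarrow\quad
\text{influence function} = \frac{\tilde l}{\mathbf{P}\{\tilde l^{\,2}\}},
\qquad
\text{asymptotic variance}
  = \frac{\mathbf{P}\{\tilde l^{\,2}\}}{\bigl(\mathbf{P}\{\tilde l^{\,2}\}\bigr)^{2}}
  = \frac{1}{\mathbf{P}\{\tilde l^{\,2}\}} .
```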